Kyounggu Yeo | Data Analyst
Trees are an essential part of our environment and play a crucial role in maintaining a healthy and sustainable city. They are not just an aesthetically pleasing addition to our concrete jungle, but also provide a range of benefits that make our lives better.
Street trees provide the city with a wide range of benefits, including a friendly environment and a clean atmosphere for humans and animals. In return, the city should provide a habitable environment for the trees. Providing the conditions to grow healthy, long-lived trees is the best approach to minimize the conflicts between trees and the surrounding urban infrastructure. (Ely, Martin, 2009)
In this analysis, we will focus on street trees in the City of Vancouver and examine their distribution by type, location, and other characteristics. By doing so, we hope to gain insight into the level of tree diversity in different areas of the city and the implications for urban planning and development.
The data for this analysis is obtained from the City of Vancouver's Open Data Portal. The dataset contains information about street trees in the City of Vancouver, including the location, species, diameter, and other characteristics of the trees. The dataset can be found at: City of Vancouver Open Data Portal > Streets and Transportation > Street trees.
UBC Data Science faculty wrangled and cleaned the original dataset and provided a modified version of it. This subset may or may not be a representative sample of the original dataset.
The street_trees dataset is a table of 21 columns stored in a .csv file. Its schema consists of the following columns: Unnamed: 0, std_street, on_street, species_name, neighbourhood_name, date_planted, diameter, street_side_name, genus_name, assigned, civic_number, plant_area, curb, tree_id, common_name, height_range_id, on_street_block, cultivar_name, root_barrier, latitude, and longitude.
Data visualization methods and techniques were used to analyze the information within the street_trees dataset.
The first step in our analysis is to gather data on street trees in Vancouver. Fortunately, the City of Vancouver has made this information publicly available through its Open Data Portal. We collected data on street trees across the city, including their location, species, diameter, and other characteristics.
# Import all the required libraries needed for EDA
import pandas as pd
import numpy as np
import altair as alt
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
# Import the street trees dataset
street_trees = pd.read_csv("https://raw.githubusercontent.com/UBC-MDS/data_viz_wrangled/main/data/Trees_data_sets/small_unique_vancouver.csv").dropna()
street_trees.head()
| | Unnamed: 0 | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | ... | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 12573 | W 18TH AV | W 18TH AV | CALLERYANA | Arbutus-Ridge | 1992-02-04 | 6.0 | ODD | PYRUS | N | ... | 7 | Y | 129645 | CHANTICLEER PEAR | 2 | 2300 | CHANTICLEER | N | 49.256350 | -123.158709 |
3 | 8856 | DOMAN ST | DOMAN ST | AMERICANA | Killarney | 1999-11-12 | 11.0 | EVEN | FRAXINUS | N | ... | 7 | Y | 180803 | AUTUMN APPLAUSE ASH | 4 | 6900 | AUTUMN APPLAUSE | N | 49.220839 | -123.036721 |
5 | 17458 | BUTE ST | BUTE ST | PERSICA | West End | 2012-04-05 | 3.0 | EVEN | PARROTIA | N | ... | C | Y | 233622 | VANESSA PERSIAN IRONWOOD | 1 | 1100 | VANESSA | N | 49.281906 | -123.133076 |
9 | 28279 | MATAPAN CRESCENT | MATAPAN CRESCENT | ZUMI | Renfrew-Collingwood | 2008-03-13 | 3.0 | ODD | MALUS | N | ... | 12 | Y | 102612 | REDBUD CRABAPPLE | 1 | 3200 | CALOCARPA | Y | 49.257272 | -123.030023 |
10 | 1684 | KINGSWAY | KINGSWAY | SYLVATICA | Kensington-Cedar Cottage | 2009-11-06 | 3.0 | MED | FAGUS | N | ... | 8 | Y | 228772 | DAWYCK'S BEECH | 1 | 1500 | DAWYCKII | N | 49.248839 | -123.073073 |
5 rows × 21 columns
It's important to clean the data. This includes checking for missing values, correcting errors, and removing irrelevant columns.
# Information of the data frame in the dataset
street_trees.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1653 entries, 1 to 4997
Data columns (total 21 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Unnamed: 0          1653 non-null   int64
 1   std_street          1653 non-null   object
 2   on_street           1653 non-null   object
 3   species_name        1653 non-null   object
 4   neighbourhood_name  1653 non-null   object
 5   date_planted        1653 non-null   object
 6   diameter            1653 non-null   float64
 7   street_side_name    1653 non-null   object
 8   genus_name          1653 non-null   object
 9   assigned            1653 non-null   object
 10  civic_number        1653 non-null   int64
 11  plant_area          1653 non-null   object
 12  curb                1653 non-null   object
 13  tree_id             1653 non-null   int64
 14  common_name         1653 non-null   object
 15  height_range_id     1653 non-null   int64
 16  on_street_block     1653 non-null   int64
 17  cultivar_name       1653 non-null   object
 18  root_barrier        1653 non-null   object
 19  latitude            1653 non-null   float64
 20  longitude           1653 non-null   float64
dtypes: float64(3), int64(5), object(13)
memory usage: 284.1+ KB
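As a minimal sketch of the cleaning checks mentioned above, the snippet below counts missing values per column and drops a leftover index column. It uses a small made-up frame (`toy_trees`, a hypothetical name) instead of the real file, so all values are illustrative assumptions; only the column names `Unnamed: 0`, `species_name`, and `diameter` come from the dataset.

```python
import pandas as pd

# Hypothetical miniature of the street_trees table (values are made up).
toy_trees = pd.DataFrame(
    {
        "Unnamed: 0": [10, 11, 12, 13],
        "species_name": ["RUBRUM", None, "PLATANOIDES", "RUBRUM"],
        "diameter": [6.0, 3.0, None, 11.0],
    }
)

# Count missing values in each column.
missing_per_column = toy_trees.isna().sum()

# Drop rows with any missing value, then drop the leftover index column.
cleaned = toy_trees.dropna().drop(columns=["Unnamed: 0"])

print(missing_per_column)
print(cleaned.shape)
```

In the real dataset, `info()` reports 1653 non-null entries in every column, which is consistent with `.dropna()` having been applied at load time.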
The second step in the analysis is to examine the basic statistics of the dataset, such as the count, mean, and median. This gives us an idea of the overall distribution of the trees in the city. For example, we can calculate the number of trees, the average diameter of the trees, and the height range of the tallest trees.
# The description of the data
street_trees.describe()
| | Unnamed: 0 | diameter | civic_number | tree_id | height_range_id | on_street_block | latitude | longitude |
---|---|---|---|---|---|---|---|---|
count | 1653.000000 | 1653.000000 | 1653.000000 | 1653.000000 | 1653.000000 | 1653.000000 | 1653.000000 | 1653.000000 |
mean | 14689.987296 | 6.004864 | 3106.491833 | 174940.944949 | 1.736237 | 3146.400484 | 49.245937 | -123.101320 |
std | 8778.890359 | 3.880239 | 2155.172608 | 59039.081620 | 0.887429 | 2183.920864 | 0.021302 | 0.048578 |
min | 2.000000 | 1.000000 | 3.000000 | 616.000000 | 0.000000 | 0.000000 | 49.203531 | -123.220360 |
25% | 6907.000000 | 3.000000 | 1365.000000 | 150798.000000 | 1.000000 | 1400.000000 | 49.228664 | -123.137163 |
50% | 14579.000000 | 4.500000 | 2728.000000 | 184692.000000 | 2.000000 | 2700.000000 | 49.245210 | -123.099031 |
75% | 22253.000000 | 8.000000 | 4472.000000 | 222708.000000 | 2.000000 | 4500.000000 | 49.262586 | -123.059325 |
max | 29979.000000 | 31.000000 | 8989.000000 | 262098.000000 | 6.000000 | 8600.000000 | 49.292581 | -123.023514 |
# The shape of the data frame
street_trees.shape
(1653, 21)
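The individual statistics mentioned above can also be computed one at a time. The following is a small sketch on a made-up series of diameters (the values are assumptions for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical diameters in inches (made-up values, not from the dataset).
diameters = pd.Series([3.0, 3.0, 4.5, 8.0, 11.5], name="diameter")

tree_count = diameters.count()          # number of non-missing values
mean_diameter = diameters.mean()        # arithmetic mean
median_diameter = diameters.median()    # 50th percentile
diameter_range = diameters.max() - diameters.min()

print(tree_count, mean_diameter, median_diameter, diameter_range)
```

In the `describe()` output for the real data, the mean diameter (about 6.0) is higher than the median (4.5), which suggests a right-skewed distribution with a few large trees.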
Next, we can create visualizations to explore the data further. We can use bar charts to visualize the distribution of tree species, scatter plots to examine the relationship between tree diameter and height, and heat maps to see the concentration of trees in different areas of the city.
Vancouver, British Columbia is known for its lush greenery and abundant parks. However, one of the city's most significant green assets is often overlooked: its street trees. Street trees, also known as boulevard trees, are those that are planted on the public right-of-way between the sidewalk and the curb. These trees play a vital role in providing shade, reducing urban heat islands, and enhancing the overall aesthetic of the city.
Now, we'll dive into the street_trees dataset and explore the varieties of trees that line Vancouver's streets.
# Aggregate data and create a bar chart
top_species = (
    alt.Chart(street_trees)
    .transform_aggregate(
        count='count()',
        groupby=['species_name']
    )
    .transform_window(
        rank='rank(count)',
        sort=[alt.SortField('count', order='descending')]
    )
    .transform_filter(
        alt.datum.rank <= 10
    )
    .mark_bar()
    .encode(
        y=alt.Y('species_name:N', sort='-x'),
        x='count:Q',
        color=alt.Color('species_name:N', scale=alt.Scale(scheme='category10'), sort=['PLATANOIDES','RUBRUM','CERASIFERA','FREEMANI X','BETULUS','SYLVATICA','X YEDOENSIS','CALLERYANA','SERRULATA','AMERICANA']),
    )
    .properties(title="Top 10 tree species in the City of Vancouver")
)
# Show chart
top_species
The top ten tree species in Vancouver are: PLATANOIDES, RUBRUM, CERASIFERA, FREEMANI X, BETULUS, SYLVATICA, X YEDOENSIS, CALLERYANA, SERRULATA, and AMERICANA. These tree species have been selected based on their abundance and prevalence in the city.
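A list of the most abundant species can also be derived in pandas instead of being typed by hand. The sketch below uses a made-up `species` series and a hypothetical `top_n` of 2 for brevity; it is an illustration under those assumptions, not the method used for the chart.

```python
import pandas as pd

# Hypothetical species column (made-up rows for illustration).
species = pd.Series(
    ["RUBRUM", "PLATANOIDES", "RUBRUM", "CERASIFERA", "PLATANOIDES", "RUBRUM"],
    name="species_name",
)

# value_counts() sorts by frequency in descending order; head(n) keeps the top n.
top_n = 2
top_species_counts = species.value_counts().head(top_n)
top_species_names = top_species_counts.index.tolist()

print(top_species_names)
```

On the real frame, the equivalent call would be `street_trees['species_name'].value_counts().head(10)`. Note that `head(10)` keeps exactly ten rows, whereas a rank-based filter may keep more when counts are tied at the cutoff.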
species_names = ['PLATANOIDES', 'RUBRUM', 'CERASIFERA', 'FREEMANI X', 'BETULUS', 'SYLVATICA', 'X YEDOENSIS', 'CALLERYANA', 'SERRULATA', 'AMERICANA']
top10_tree = street_trees.loc[street_trees['species_name'].isin(species_names)]
top10_tree.head()
| | Unnamed: 0 | std_street | on_street | species_name | neighbourhood_name | date_planted | diameter | street_side_name | genus_name | assigned | ... | plant_area | curb | tree_id | common_name | height_range_id | on_street_block | cultivar_name | root_barrier | latitude | longitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 12573 | W 18TH AV | W 18TH AV | CALLERYANA | Arbutus-Ridge | 1992-02-04 | 6.0 | ODD | PYRUS | N | ... | 7 | Y | 129645 | CHANTICLEER PEAR | 2 | 2300 | CHANTICLEER | N | 49.256350 | -123.158709 |
3 | 8856 | DOMAN ST | DOMAN ST | AMERICANA | Killarney | 1999-11-12 | 11.0 | EVEN | FRAXINUS | N | ... | 7 | Y | 180803 | AUTUMN APPLAUSE ASH | 4 | 6900 | AUTUMN APPLAUSE | N | 49.220839 | -123.036721 |
10 | 1684 | KINGSWAY | KINGSWAY | SYLVATICA | Kensington-Cedar Cottage | 2009-11-06 | 3.0 | MED | FAGUS | N | ... | 8 | Y | 228772 | DAWYCK'S BEECH | 1 | 1500 | DAWYCKII | N | 49.248839 | -123.073073 |
15 | 5416 | GOTHARD ST | CLARENDON ST | PLATANOIDES | Renfrew-Collingwood | 1994-11-08 | 3.0 | EVEN | ACER | Y | ... | 8 | Y | 156303 | GLOBEHEAD NORWAY MAPLE | 2 | 4700 | GLOBOSUM | N | 49.241778 | -123.054438 |
19 | 17945 | W 12TH AV | W 12TH AV | SERRULATA | Kitsilano | 2008-03-13 | 9.0 | ODD | PRUNUS | N | ... | 20 | Y | 106587 | SHIROTAE(MT FUJI) CHERRY | 1 | 2600 | SHIROTAE | N | 49.261319 | -123.164948 |
5 rows × 21 columns
With this data, we can explore the relationship between tree diameter and height.
To study this relationship, we plot diameter against height for the top ten species in one scatter plot, with a dropdown menu to highlight a single species at a time. Scatter plots are a useful tool to visualize the relationship between two variables. In this case, the diameter of the tree is the independent variable, and the height of the tree, recorded as the binned height_range_id, is the dependent variable.
# Set up chart dimensions and mark
chart_width = 700
chart_height = 500
point_size = 40
chart_title = "The characteristics of the top 10 tree species"
chart_mark = alt.MarkDef(type='circle', size=point_size)
# Set up chart encoding
chart_encoding = alt.Chart(top10_tree, width=chart_width, height=chart_height).mark_circle(size=point_size).encode(
    alt.X('diameter', title='Diameter (inches)'),
    alt.Y('height_range_id', title='Height range (*10 feet)'),
    alt.Color('species_name', title="Species", scale=alt.Scale(scheme='category10'), sort=['PLATANOIDES','RUBRUM','CERASIFERA','FREEMANI X','BETULUS','SYLVATICA','X YEDOENSIS','CALLERYANA','SERRULATA','AMERICANA']),
    tooltip=['species_name']
).properties(title=chart_title)
# Set up dropdown menu for selecting tree species
types = sorted(top10_tree['species_name'].unique())
dropdown = alt.binding_select(name='Species', options=types)
select_types = alt.selection_single(fields=['species_name'], bind=dropdown)
# Update chart encoding to include opacity condition based on dropdown selection
chart_encoding_with_selection = chart_encoding.add_selection(select_types).encode(
    opacity=alt.condition(select_types, alt.value(0.7), alt.value(0.05))
)
# Render the chart
tree_point = chart_encoding_with_selection
tree_point
By examining the scatter plots, we can look for patterns or trends in the data.
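A visual check can be complemented with a number. Because height is stored as an ordered range id and not as a measurement, a rank (Spearman) correlation is one reasonable choice; this is an addition for illustration and not part of the original analysis. The sketch below computes it as the Pearson correlation of the ranks, on a made-up frame (`toy`, with assumed values).

```python
import pandas as pd

# Hypothetical diameter / height-range pairs (made-up values).
toy = pd.DataFrame(
    {
        "diameter": [3.0, 6.0, 9.0, 12.0, 20.0],
        "height_range_id": [1, 2, 2, 3, 4],
    }
)

# Spearman correlation equals the Pearson correlation of the ranks.
# rank() assigns average ranks to tied values.
ranks = toy.rank()
spearman_rho = ranks["diameter"].corr(ranks["height_range_id"])

print(round(spearman_rho, 3))
```

The same two lines could be applied to `top10_tree[['diameter', 'height_range_id']]`, overall or per species via `groupby('species_name')`, to see whether the numbers agree with the impression from the plot.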
Studying the relationship between tree diameter and height is important for urban forestry management. Knowing the relationship between these two variables can help urban foresters determine the appropriate pruning and maintenance schedules for different tree species.
Next, we look at how the trees are distributed across Vancouver's neighbourhoods.
# Create chart
trees_count_neighbourhood = alt.Chart(street_trees).mark_bar().encode(
    x=alt.X('neighbourhood_name', sort='-y', title='Neighbourhoods'),
    y=alt.Y('count()', title='Number of trees')
).properties(
    title="The number of trees for each neighborhood",
    width=700,
    height=200
)
# Add text labels to bars
trees_count_neighbourhood_text = trees_count_neighbourhood.mark_text(
    align='center', dy=-7
).encode(
    text=alt.Text('count(neighbourhood_name)')
)
# Display chart with text labels
trees_count_neighbourhood + trees_count_neighbourhood_text
The bar chart shows that the Renfrew-Collingwood area has the highest number of trees in the city.
Renfrew-Collingwood is situated along the eastern boundary of the city, adjacent to Burnaby. Although primarily residential, the area offers convenient access to nature, including the beautiful Renfrew Ravine Park featuring a natural creek within the Still Creek watershed. The residents of Renfrew-Collingwood also have easy access to a wide range of services and amenities, particularly along the Collingwood stretch of Kingsway (Renfrew-Collingwood | City of Vancouver).
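The neighbourhood with the most (or fewest) trees can also be read off programmatically. Here is a minimal sketch on a made-up frame; the rows are assumptions for illustration, and only the column name `neighbourhood_name` comes from the dataset.

```python
import pandas as pd

# Hypothetical rows, one per tree (made-up for illustration).
toy = pd.DataFrame(
    {
        "neighbourhood_name": [
            "Renfrew-Collingwood",
            "Kitsilano",
            "Renfrew-Collingwood",
            "West End",
            "Kitsilano",
            "Renfrew-Collingwood",
        ]
    }
)

# Number of trees per neighbourhood, sorted from most to fewest.
counts = toy["neighbourhood_name"].value_counts()

most_trees = counts.idxmax()    # neighbourhood with the highest count
fewest_trees = counts.idxmin()  # candidate area for new plantings

print(most_trees, fewest_trees)
```

Since the real table is a subset of the city's records, low counts there may reflect sampling as well as actual coverage.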
As the dataset includes location information for the trees, we can perform a spatial analysis to see the distribution of the trees across the city. This can also help identify areas that are lacking in tree coverage and where new plantings may be needed.
One of the simplest ways to visualize geospatial data is by creating a point map. The idea is to mark a point on the map at the location of each street tree.
# Set the map location and GeoJSON file
vancouver_location = [49.246292, -123.116226]
geo_json = "https://raw.githubusercontent.com/blackmad/neighborhoods/master/vancouver.geojson"
# Create the map object
trees_map = folium.Map(
    location=vancouver_location,
    zoom_start=12,
    max_zoom=14,
    min_zoom=12,
    tiles="Stamenterrain"
)
# Create the choropleth layer for neighborhood tree counts
# rename_axis() + reset_index(name=...) yields the columns
# ['neighbourhood', 'count'] regardless of the pandas version
data_choropleth = (
    street_trees['neighbourhood_name']
    .value_counts()
    .rename_axis('neighbourhood')
    .reset_index(name='count')
)
trees_map.choropleth(
    geo_data=geo_json,
    name="choropleth",
    data=data_choropleth,
    columns=['neighbourhood', 'count'],
    key_on="feature.properties.name",
    fill_color="YlGn",
    fill_opacity=0.5,
    line_opacity=0.4,
    legend_name="Number of Trees (Count)"
)
# Create markers for each tree
for _, row in street_trees.iterrows():
    icon_color = None
    # Set the icon color based on the tree species
    if row["species_name"] == "PLATANOIDES":
        icon_color = "#17becf"
    elif row["species_name"] == "RUBRUM":
        icon_color = "#ffbb78"
    elif row["species_name"] == "CERASIFERA":
        icon_color = "#c49c94"
    elif row["species_name"] == "FREEMANI X":
        icon_color = "#9edad5"
    elif row["species_name"] in ("BETULUS", "SYLVATICA"):
        icon_color = "#d62728"
    elif row["species_name"] == "X YEDOENSIS":
        icon_color = "#bcbd22"
    elif row["species_name"] in ("CALLERYANA", "SERRULATA"):
        icon_color = "#ff9896"
    elif row["species_name"] == "AMERICANA":
        icon_color = "#ff7f0e"
    # Add a circle marker for the tree
    folium.CircleMarker(
        location=[row["latitude"], row["longitude"]],
        radius=1,
        popup=row['species_name'] + ' on ' + row['on_street'] + ' in ' + row['neighbourhood_name'],
        color=icon_color
    ).add_to(trees_map)
# Display the map
trees_map
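The species-to-colour chain in the marker loop could alternatively be expressed as a dictionary lookup, which keeps the mapping in one place. This is only a sketch of that design choice: the names `SPECIES_COLORS` and `species_color` are hypothetical, while the colour codes are copied from the loop.

```python
# Species-to-colour mapping copied from the if/elif chain in the marker loop.
SPECIES_COLORS = {
    "PLATANOIDES": "#17becf",
    "RUBRUM": "#ffbb78",
    "CERASIFERA": "#c49c94",
    "FREEMANI X": "#9edad5",
    "BETULUS": "#d62728",
    "SYLVATICA": "#d62728",
    "X YEDOENSIS": "#bcbd22",
    "CALLERYANA": "#ff9896",
    "SERRULATA": "#ff9896",
    "AMERICANA": "#ff7f0e",
}


def species_color(species_name):
    """Return the marker colour for a species, or None if it is not in the mapping."""
    return SPECIES_COLORS.get(species_name)


print(species_color("RUBRUM"))
```

Inside the loop, `icon_color = species_color(row["species_name"])` would then replace the whole chain, with species outside the top ten still mapping to `None`.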
In conclusion, the top ten tree species in Vancouver are identified as PLATANOIDES, RUBRUM, CERASIFERA, FREEMANI X, BETULUS, SYLVATICA, X YEDOENSIS, CALLERYANA, SERRULATA, and AMERICANA, based on their abundance in the dataset. The scatter plot showed no clear relationship between tree diameter and height range. The Renfrew-Collingwood area has the highest number of trees.
The street_trees dataset provides valuable insights into the diversity of tree species in Vancouver and highlights the importance of maintaining and expanding the city's urban forest. Maintaining the health of Vancouver's street trees is crucial in mitigating the impacts of climate change and improving the quality of life for its residents. Continuous monitoring and maintenance will ensure that the benefits of street trees are enjoyed by future generations.